Reflections on Syndicating my Fortune Cookies as Atom

Shlomi Fish on 2008-07-15T13:26:31

Another technical entry, on my fortune cookies' FortuneXML work again. When I last left you, I was able to transform the XML sources of the fortune cookies to XHTML and to plaintext, and ran into a nasty stack smash that prevented me from uploading the module to the CPAN.

Then I decided that, now that each fortune had its own unique URL, and since people didn't have the time to regularly check what had changed, it was high time to set up web-feed syndication for it. I decided to use XML-Feed directly, and to generate an Atom feed. (So far I had only done RSS, using XML-RSS.)

It seemed logical that I'd simply go over all the individual id="" attributes in the XML files I maintained, and see which ones were added. Then I'd generate a feed containing the most recent 20 or so. However, the XML does not contain the dates when they were added. So I decided that, for simplicity, I'd record the dates on which all the IDs were added in a YAML file, which would itself be updated as new IDs appeared. (A database of some sort may be better eventually.)
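To make the bookkeeping concrete, here is a minimal sketch in Python (not the Perl code the script actually uses). The function name record_new_ids, the fortune IDs and the in-memory dictionary are all hypothetical; loading and saving the YAML file is left out, on the assumption that it maps each ID to the date it was first seen.

```python
from datetime import date

def record_new_ids(known_dates, current_ids, today):
    """Stamp every ID not yet recorded with `today` and return the new IDs.

    `known_dates` stands in for the contents of the YAML file: a mapping
    from fortune ID to the date it was first seen. It is updated in place.
    """
    new_ids = [fid for fid in current_ids if fid not in known_dates]
    for fid in new_ids:
        known_dates[fid] = today
    return new_ids

# Hypothetical data: one ID already recorded, one newly added to the XML.
known = {"fortune-a": date(2008, 7, 1)}
added = record_new_ids(known, ["fortune-a", "fortune-b"], date(2008, 7, 15))
```

IDs that were already recorded keep their original date, so re-running the script does not make old fortunes look new.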

Now, how to find the most recent 20 dates from the YAML and the XMLs? One option would be to collect all the dates and sort them, but that seemed wasteful to me. Recalling my Computer Science Data Structures studies, I remembered that I could use a priority queue for that, and that there was a module for it on the CPAN. A search for "Priority Queue" or "PQ" yielded nothing, but I then searched for Heap and found Heap.pm, which implements several heaps that can be used as a priority queue.
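The idea of picking the newest 20 without sorting everything can be sketched with a bounded min-heap. This is an illustration in Python using the standard heapq module, not the Perl Heap.pm code from the script; the function name most_recent and the sample IDs are made up for the example.

```python
import heapq
from datetime import date

def most_recent(entries, n=20):
    """Return the n newest (date_added, fortune_id) pairs, newest first.

    `entries` is any iterable of (date_added, fortune_id) tuples.
    """
    heap = []  # min-heap: heap[0] is the oldest entry currently kept
    for entry in entries:
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Newer than the oldest kept entry, so it takes its place.
            heapq.heapreplace(heap, entry)
    # Only the n survivors are sorted, not the whole input.
    return sorted(heap, reverse=True)

# Hypothetical sample data.
sample = [
    (date(2008, 7, 1), "fortune-a"),
    (date(2008, 7, 15), "fortune-b"),
    (date(2008, 6, 20), "fortune-c"),
    (date(2008, 7, 10), "fortune-d"),
]
newest_two = most_recent(sample, n=2)
```

The heap never holds more than n items, so for m fortunes the work is roughly O(m log n) rather than the O(m log m) of a full sort.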

Heap is a very impressive module. I remember that back when I studied at the Technion, it used to be the only usable sourceware (and open-source) implementation I found of Fibonacci heaps. I decided to use the Binary heap this time, though.

I had to write some wrapper code to get it working. I decided against writing automated tests before I wrote the code (= "Test-Driven Development"), and just wrote the code and got it working, because I found that I didn't exactly know what it would generate. Anyway, I now have the script working.

I encountered some problems in trying to find how to render just one fortune element and not the entire file. After not finding anything about it in the XML::LibXSLT documentation, I opted to tweak the XSLT stylesheet and create an optional parameter for it to act upon.

Eventually, I got an Atom feed. Firefox displays it fine, but it doesn't validate. Part of the problem is that I needed to strip the "<html>" and other such wrapper tags from the feed entries' content. But another issue is that XML-Feed does not support all the required Atomisms.
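Trimming the wrapper tags amounts to keeping only what sits inside "<body>". The following is a rough Python sketch of that idea using the standard library, not what the script does. It assumes the rendered fortune is well-formed XHTML in the standard XHTML namespace with no undeclared entities such as &nbsp;; the function name body_fragment is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize XHTML elements with a default namespace instead of "ns0:" prefixes.
ET.register_namespace("", XHTML_NS)

def body_fragment(xhtml_text):
    """Return the serialized children of <body>, without the <html>/<body> wrappers."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        raise ValueError("no <body> element found")
    # Leading text directly inside <body> is not part of any child element.
    parts = [escape(body.text or "")]
    # tostring() on each child also includes the text that follows it.
    parts.extend(ET.tostring(child, encoding="unicode") for child in body)
    return "".join(parts).strip()

# Hypothetical rendered page.
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>A fortune</title></head>"
    "<body><p>Hello</p></body></html>"
)
fragment = body_fragment(page)
```

Each top-level element of the fragment carries its own xmlns declaration, so it stays valid XHTML when embedded as an entry's content.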

I've noticed that both XML-Feed and XML-Atom have suffered from neglect (including the lack of support for those Atomisms), and decided to adopt them. I set up a repository for XML-Feed on BerliOS and started to get into it. The first thing I did was to get rid of ExtUtils::AutoInstall, and I have also given repository access to another person who is interested in revamping them. Thanks to BingOS for helping me with some Module::Install-isms.

As it turns out, the XML-LibXSLT bug was actually a Cooker-specific bug, which seems to be caused by problematic compilation flags.

So world-domination through Unix-like fortune cookies is still making progress. I'll probably release XML-Grammar-Fortune in its current form soon, since the stack smash I got is highly system-dependent.